SCFG latent annotation for machine translation
Authors
Abstract
We discuss learning latent annotations for synchronous context-free grammars (SCFG) for the purpose of improving machine translation. We show that learning annotations for nonterminals results in not only more accurate translation, but also faster SCFG decoding.
Related resources
Soft Syntactic Constraints for Hierarchical Phrase-Based Translation Using Latent Syntactic Distributions
In this paper, we present a novel approach to enhance hierarchical phrase-based machine translation systems with linguistically motivated syntactic features. Rather than directly using treebank categories as in previous studies, we learn a set of linguistically-guided latent syntactic categories automatically from a source-side parsed, word-aligned parallel corpus, based on the hierarchical str...
Better Synchronous Binarization for Machine Translation
Binarization of Synchronous Context-Free Grammars (SCFG) is essential for achieving polynomial time complexity of decoding for SCFG parsing-based machine translation systems. In this paper, we first investigate the excess edge competition issue caused by a left-heavy binary SCFG derived with the method of Zhang et al. (2006). Then we propose a new binarization method to mitigate the problem by e...
Utilizing Target-Side Semantic Role Labels to Assist Hierarchical Phrase-based Machine Translation
In this paper we present a novel approach to utilizing Semantic Role Labeling (SRL) information to improve Hierarchical Phrase-based Machine Translation. We propose an algorithm to extract SRL-aware Synchronous Context-Free Grammar (SCFG) rules. Conventional Hiero-style SCFG rules will also be extracted in the same framework. Special conversion rules are applied to ensure that when SRL-aware SCF...
Automatically Improved Category Labels for Syntax-Based Statistical Machine Translation
A common modeling choice in syntax-based statistical machine translation is the use of synchronous context-free grammars, or SCFGs. When training a translation model in a supervised setting, an SCFG is extracted from parallel text that has been statistically word-aligned and parsed by monolingual statistical parsers. However, the set of syntactic category labels used in a monolingual statistica...
Using Features from Topic Models to Alleviate Over-Generation in Hierarchical Phrase-Based Translation
In hierarchical phrase-based translation systems, the grammars (SCFG rules) have an over-generation problem because we can replace the non-terminal X with almost anything without knowing the syntactic or semantic role of X. In this paper, we present an approach that uses topic models to learn the distributions for non-terminals in each SCFG rule, based on which we further derive static features f...